Change from TextFileReader to ParquetStreamReader by JanWillruth · Pull Request #348 · glamod/cdm_reader_mapper

JanWillruth · 2026-01-15T13:25:15Z

To do

Ideas

write a decorator so that a function can be called in the "normal" way, while the decorator decides whether to execute the function directly or pass it to common.iterators.process_disk_backed
- similar to _apply_or_chunk, but as a decorator
- e.g. both my_func(DataFrame, *args, **kwargs) and my_func(ParquetStreamReader, *args, **kwargs) are valid calls
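The decorator idea above could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the implementation in this PR: ChunkStream stands in for ParquetStreamReader, process_chunks stands in for common.iterators.process_disk_backed (whose real signature and disk-backed behaviour are not shown here), and plain lists stand in for pd.DataFrame chunks so the example runs without pandas.

```python
from functools import wraps
from typing import Any, Callable, Iterable, Iterator


class ChunkStream:
    """Hypothetical stand-in for ParquetStreamReader: a single-pass iterator over chunks."""

    def __init__(self, chunks: Iterable[Any]):
        self._chunks = iter(chunks)

    def __iter__(self) -> Iterator[Any]:
        return self._chunks


def process_chunks(func: Callable, stream: ChunkStream, *args, **kwargs) -> ChunkStream:
    """Hypothetical stand-in for common.iterators.process_disk_backed.

    Applies func lazily to every chunk and wraps the results in a new stream,
    so the full dataset never has to be held in memory at once.
    """
    return ChunkStream(func(chunk, *args, **kwargs) for chunk in stream)


def apply_or_chunk(func: Callable) -> Callable:
    """Run func directly on a single object, or chunk-wise on a stream."""

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(data, *args, **kwargs):
        if isinstance(data, ChunkStream):
            return process_chunks(func, data, *args, **kwargs)
        return func(data, *args, **kwargs)

    return wrapper


@apply_or_chunk
def scale(values: list, factor: float) -> list:
    """Example function written as if it only ever receives one in-memory chunk."""
    return [v * factor for v in values]
```

With this wrapper, scale([1, 2], 10) runs directly, while scale(ChunkStream([[1, 2], [3]]), factor=2) returns a new lazy stream; callers use one signature for both cases. Dispatching on the stream type (rather than on pd.DataFrame) means any non-stream input simply passes through to the original function.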

Issues

This PR addresses open issues:


github-actions · 2026-01-15T13:25:31Z

Warning: This Pull Request is coming from a fork and must be manually tagged approved in order to perform additional testing.


ludwiglierhammer · 2026-01-28T16:25:23Z

@JanWillruth: I ran some high-performance tests. This PR does not really affect the maximum memory usage, but it speeds up the code a little. Nevertheless, we should adopt this PR for readability reasons.

Do you want to move the ParquetStreamReader to common? Then we could keep the ParquetStreamReader in one general place and make use of it in common.select, mdf_reader and cdm_mapper.

ludwiglierhammer · 2026-01-29T10:47:54Z

I merged #360 into the main branch. Please resolve the merge conflicts and we can focus on this PR again.



ludwiglierhammer · 2026-01-30T06:44:06Z

Hi @jtsiddons, what do you think about this PR? Any ideas for improvements or general comments?

jtsiddons · 2026-01-30T06:58:07Z

> Hi @jtsiddons, what do you think about this PR? Any ideas for improvements or general comments?

Thanks @ludwiglierhammer - I've scheduled some time to have a look this afternoon

cdm_reader_mapper/mdf_reader/utils/utilities.py

ludwiglierhammer · 2026-02-13T11:53:02Z

Hi @jtsiddons, we could replace all TextFileReader elements with the new ParquetStreamReader. Do you have any further suggestions for this PR? We would appreciate your review.


Change from TextFileReader to ParquetStreamReader for (better) handling of files larger than RAM when chunk_size is specified; Rework affected Databundle code

c09a06f

JanWillruth requested a review from ludwiglierhammer January 15, 2026 13:25

github-actions bot added the mdf_reader label Jan 15, 2026

Save column schemas alongside parquet to restore MultiIndex column names; Remove unneeded TextFileReader tests from test_pandas.py

c266a4d
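One way to read the commit above: Parquet stores each column under a plain string name, so tuple-valued MultiIndex column names can be flattened for storage while a small JSON schema kept alongside the file records how to rebuild them. The sketch below is an assumption about the approach, not this PR's code; the separator SEP and the helpers flatten_columns and restore_columns are hypothetical, and writing the schema string to a sidecar file is left out.

```python
import json
from typing import Dict, List, Sequence, Tuple

SEP = "::"  # hypothetical separator; assumed never to occur inside a level name


def flatten_columns(columns: Sequence[Tuple[str, ...]]) -> Tuple[List[str], str]:
    """Join tuple column names into strings and build a JSON schema to undo it."""
    flat = [SEP.join(col) for col in columns]
    # Map each flattened name back to its original levels.
    schema: Dict[str, List[str]] = {name: list(col) for name, col in zip(flat, columns)}
    return flat, json.dumps(schema)


def restore_columns(flat_names: Sequence[str], schema_json: str) -> List[Tuple[str, ...]]:
    """Map flattened names read back from the file to the original tuples."""
    schema = json.loads(schema_json)
    return [tuple(schema[name]) for name in flat_names]
```

Because restore_columns looks names up in the schema instead of relying on position, it also works when only a subset of the columns is read back.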

ludwiglierhammer mentioned this pull request Jan 28, 2026

Speed up some functions #360

Merged

2 tasks

JanWillruth and others added 3 commits January 29, 2026 11:57

Merge remote-tracking branch 'origin/main' into reader_io

8588068

Conflicts: cdm_reader_mapper/mdf_reader/utils/utilities.py, tests/test_reader_utilities.py

try to use ParquetStreamReader

a6e7b86

re-add make_copy for TextFileReader objects

858b4ca

github-actions bot added the common label Jan 29, 2026

ludwiglierhammer added 2 commits January 29, 2026 15:01

test_mdf_reader:test_read_data_textfilereader TextFileReader -> ParquetStreamReader

f223253

ParquetStreamReader to cdm_reader_mapper.common.iterators

691444d

ludwiglierhammer mentioned this pull request Jan 30, 2026

TextFileReader needed? #8

Open

ludwiglierhammer mentioned this pull request Jan 30, 2026

delete code blocks containing TextFileReader #23

Closed

ludwiglierhammer reviewed Jan 30, 2026


cdm_reader_mapper/mdf_reader/utils/utilities.py (outdated)

This was referenced Jan 30, 2026

add reading and writing of parquet and feather data #363

Merged

re-work testing suite #365

Open

ludwiglierhammer and others added 7 commits February 6, 2026 08:08

Update cdm_reader_mapper/mdf_reader/utils/utilities.py

4d49bde

remove unused variable

ae65dff

explicitly set data types

914a725

use common.iterators.process_disk_backed

29f789d

new function: common.iterators.is_valid_iterable

88b1601

make common.iterators.process_disk_backed run with Iterable[pd.Series]

954be46

use common.iterators.process_disk_backed in metmetpy.validate

5416ea8

github-actions bot added the metmetpy label Feb 6, 2026

optionally aggregate non-data outputs

0e80b6e

ludwiglierhammer linked an issue Feb 10, 2026 that may be closed by this pull request

Refactor logic to handle chunking outside of reader(/mapper) for readability/maintenance #349

Open

ludwiglierhammer added 2 commits February 11, 2026 11:53

run databundle with common.iterators

f160e29

cdm_mapper.mapper uses common.iterators

e2f92d1

github-actions bot added the cdm_mapper label Feb 11, 2026

ludwiglierhammer added 10 commits February 11, 2026 15:44

mdf_reader.utils now uses common.iterators

696a4dc

make core use common.iterators

0151861

re-work indexing

85ac14d

new ParquetStream method: reset_index

172dba1
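A reset_index over a chunked stream plausibly needs to carry a running offset across chunks, because calling pandas' DataFrame.reset_index(drop=True) on each chunk separately restarts the index at 0 in every chunk. The sketch below only illustrates that offset logic under stated assumptions: chunks are modelled as lists of (index, row) pairs instead of DataFrames, the function name reset_index_stream is hypothetical, and the actual method added in this commit may behave differently.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# Hypothetical chunk model: a list of (index, row) pairs standing in for a DataFrame.
Chunk = List[Tuple[int, Any]]


def reset_index_stream(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Renumber rows 0..N-1 across all chunks, dropping the old index (like drop=True)."""
    offset = 0
    for chunk in chunks:
        # Continue numbering where the previous chunk stopped instead of restarting at 0.
        yield [(offset + i, row) for i, (_old, row) in enumerate(chunk)]
        offset += len(chunk)
```

The generator stays lazy, so only one chunk is renumbered at a time, which matches the goal of handling files larger than RAM.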

internally: reset_index from _split_df to _split_dispatch

417dba5

preserve indexes while parsing

98a103f

update tests

d0b1b08

reduce complexity of process_disk_backed

5403fff

remove TextFileReader references

8e6f20f

__getattr__ makes real copies

c67f361

ludwiglierhammer and others added 11 commits February 17, 2026 12:22

delete print statement

3147b3f

use isinstance ParquetStreamReader

651e356

new postprocessing decorator to apply a function to both pd.DataFrame and Iterable of pd.DataFrame

cf2f186

use postprocessing decorator I

c7837a2

use postprocessing decorator II

da40ff6

use postprocessing decorator III

9598308

introduce ProcessFunction class

eb8f185

add AI unit tests

2f33ec1

fixing pylint

a42fa76

Merge branch 'glamod:main' into reader_io

7dc7b2a

update CHANGELOG

fe3743f

github-actions bot added docs information labels Feb 19, 2026

ludwiglierhammer and others added 2 commits February 19, 2026 11:04

Merge branch 'main' into reader_io

ab539f6

fixing merge conflicts manually

1d66505

JanWillruth wants to merge 49 commits into glamod:main from JanWillruth:reader_io

JanWillruth commented Jan 15, 2026 (edited by ludwiglierhammer)

github-actions bot commented Jan 15, 2026

ludwiglierhammer commented Jan 28, 2026

ludwiglierhammer commented Jan 29, 2026

ludwiglierhammer commented Jan 30, 2026

jtsiddons commented Jan 30, 2026

ludwiglierhammer commented Feb 13, 2026 (edited)


3 participants
